BayesMD: Flexible Biological Modeling for Motif Discovery
Abstract
We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. First, a mixture of Dirichlets is used as the prior over nucleotide probabilities in binding sites; it is trained on transcription factor (TF) databases in order to extract the typical properties of TF binding sites. Second, organism-specific priors for the background sequences are trained in a similar fashion. Third, a prior over the position of binding sites represents information complementary to the motif and background priors, drawn from conservation, local sequence complexity, nucleosome occupancy, and similar sources, together with assumptions about the number of occurrences. The Bayesian inference is carried out using a combination of exact marginalization (over the multinomial parameters) and sampling (over the position of sites). Robust sampling results are achieved using parallel tempering, an advanced sampling method. In a post-analysis step, candidate motifs with high marginal probability are found by searching among those motifs that contain frequently occurring sites. Maximum a posteriori inference for the motifs is thereby avoided, and the marginal probabilities can be used directly to assess the significance of the findings. The framework is benchmarked against other methods on a number of real and artificial data sets. The accompanying prediction server, documentation, software, models and data are available from http://bayesmd.binf.ku.dk/.
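To make the "exact marginalization (multinomial parameters)" step concrete, the following is a minimal sketch of how the nucleotide probabilities of a single motif column can be integrated out analytically under a mixture-of-Dirichlets prior. It is not taken from the BayesMD software: the function names, the two mixture components, and their weights are hypothetical values chosen only for illustration, and a trained prior would have its own components.

```python
import math


def log_dirichlet_marginal(counts, alpha):
    """Log probability of an ordered column of nucleotides with the given
    counts (A, C, G, T), after integrating the multinomial parameters out
    under a Dirichlet(alpha) prior."""
    alpha_sum = sum(alpha)
    count_sum = sum(counts)
    result = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + count_sum)
    for n, a in zip(counts, alpha):
        result += math.lgamma(a + n) - math.lgamma(a)
    return result


def log_mixture_marginal(counts, weights, alphas):
    """Log marginal probability under a mixture of Dirichlets:
    log sum_k w_k * P(counts | alpha_k), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_dirichlet_marginal(counts, a)
        for w, a in zip(weights, alphas)
    ]
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


# Hypothetical two-component prior (assumed values, not trained ones):
# one component favours strongly conserved columns, one is nearly flat.
weights = [0.6, 0.4]
alphas = [[0.2, 0.2, 0.2, 0.2], [2.0, 2.0, 2.0, 2.0]]

conserved_column = [9, 0, 1, 0]  # counts of A, C, G, T at one motif position
mixed_column = [3, 2, 3, 2]

print(log_mixture_marginal(conserved_column, weights, alphas))
print(log_mixture_marginal(mixed_column, weights, alphas))
```

Because the marginal has a closed form, a sampler only needs to explore site positions; the column counts implied by each proposed set of positions can be scored directly with a function like `log_mixture_marginal`.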
Similar resources
Motif discovery programs
BayesMD [1] is a probabilistic, Bayesian model for predicting novel transcription factor binding sites. Biological information about binding site properties, background sequence models, occurrence and positional preferences is built into the model in a modular fashion. Mixture prior parameters for the motif and background are trained using information on TFBSs and organism-specific promoter sequ...
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
Designing HMMs: Motif discovery and modeling
Position Specific Scoring Matrices capture the distribution of residues observed in each position in a conserved motif, but are not a good model for variable length motifs, recognition of new instances with insertions and deletions, and positional dependencies. Moreover, PSSMs can be used to search for instances of an ungapped motif in an unlabeled sequence, but do not lend themselves to precis...
F3Dock: A Fast, Flexible and Fourier Based Approach to Protein-Protein Docking
Protein interactions, key to many biological processes, involve induced fit between flexible proteins which typically undergo conformational changes. Modeling this flexible protein-protein docking is an important step in drug discovery, structure determination and understanding structure-function relationships. In this paper, we present F3Dock, a Fast Flexible and Fourier based dockin...
An Evolutionary Model of DNA Substring Distribution
DNA sequence analysis methods, such as motif discovery, gene detection or phylogeny reconstruction, can often provide important input for biological studies. Many such methods require a background model, representing the expected distribution of short substrings in a given DNA region. Most current techniques for modeling this distribution disregard the evolutionary processes underlying DNA f...
Journal title:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 15, Issue 10
Publication date: 2008